Sequential decoding is a technique for encoding and decoding at moderate
cost with a decoding reliability which approximates that of the optimum, 
and expensive, maximum-likelihood decoder. The several known sequential 
decoding algorithms enjoy a cost advantage over the maximum-likelihood 
decoder because they allow the level of the channel noise to regulate the 
level of the decoding computation. Since the average level of the required 
decoding computation for sequential decoders is small for source rates 
below a rate $R_{\text{comp}}$, such a decoder can be realized for these rates with a
relatively small logic unit and a buffer. The logic unit is normally designed 
to handle computation rates which are less than two or three times the 
average computation rate; the buffer serves to store data during those noisy
periods when the required computation rate exceeds the computation rate 
of the logic unit. 

If the periods of high computation, which are caused by noise, are too 
frequent or too long, the buffer, which is necessarily finite in capacity, will 
fill and overflow. Since data are lost during an overflow, continuity in the 
decoding process cannot be maintained. The decoder, then, cannot continue 
to decode without error. For this reason, buffer overflow is an important 
event. In addition, since errors in the absence of overflow are much less 
frequent than are overflows themselves, the overflow event is of primary 
concern in the design of a sequential decoder. 

This paper presents some recent analytical results concerning the proba- 
bility of a buffer overflow. In particular, it is shown that this probability 
is relatively insensitive to both the buffer capacity and the maximum speed 
of the logic unit for moderate capacities and speeds. By contrast, it is shown 
that the overflow probability decreases rapidly with a decrease in the source
rate and is more than squared by a halving of this rate. These sensitivities are
basic to sequential decoding; they exist because the required computation level
is large during intervals of high channel noise and grows exponentially with
the length of such an interval. It is also shown that the dependence of the
overflow probability on the source rate is intimately related to exponents
appearing in the coding theorem. In addition, the results presented agree
with the limited experimental evidence available.

* The results of this paper are drawn from the author's thesis, which has been
accepted by the Massachusetts Institute of Technology in partial fulfillment of
the requirements for the degree of Doctor of Philosophy. The research reported
here was supported by the M.I.T. Research Laboratory of Electronics and the
M.I.T. Lincoln Laboratory.

I. INTRODUCTION 1 

Sequential decoding procedures are important because they achieve, 
at modest cost, a decoding error rate which approximates the error rate 
of the optimum and expensive maximum-likelihood decoder. Sequential 
decoding procedures have this near-optimum performance at modest 
cost because they allow the level of the channel noise to determine the 
level of the decoding computation. The level of the decoding computa- 
tion is a function of the source rate as well as the channel noise and if 
the source rate is held at less than a computational cutoff rate, $R_{\text{comp}}$,
the computation level on the average will be small. 1216 Thus, a sequential 
decoder may be constructed from a logic unit capable of handling two 
or three times the average computation rate and from a buffer to store 
data during those noisy periods which require a computation rate which 
exceeds that of the basic decoding machine. The maximum likelihood 
decoder, however, always requires a very high computation rate and, in 
effect, is designed to handle the peak noise levels. 

The buffer portion of the decoder stores data during periods of high 
computation and since it has finite capacity, it will fill and overflow if 
the high computation intervals are too frequent or too long. If and when 
a buffer overflow occurs, the decoder cannot continue to decode reliably 
since data which are important to the continuing decoding process are 
lost. Consequently, a buffer overflow forces a halt in the decoding process 
while both the encoding and decoding processes are restarted. 

While errors occur after the onset of overflow, they may also occur 
in the absence of overflow. For a properly chosen code, however, it can 
be argued that errors in the absence of overflow occur much less frequently
than do overflows themselves. Consequently, it can be argued (and, indeed,
it is found in practice) that the buffer overflow event is of
primary concern in the design of a sequential decoder.

In this paper, we present some recent results 3 concerning the proba- 
bility of a buffer overflow. In particular, we show by upperbounding this 
probability that it is relatively insensitive to machine speed and to the 
storage capacity of the buffer for moderate speeds and capacities. By
contrast, it is shown that the overflow probability decreases rapidly with
a decrease in the source rate and that this probability is more than 
squared by a halving of the rate. It is found that these sensitivities are 
basic to sequential decoding and arise because the computation per 
decoded digit is large during intervals of high channel noise and grows 
exponentially with the length of such an interval. We show that the 
dependence of the overflow probability on the source rate is intimately 
connected with exponents found in the coding theorem. 4,5 In addition,
the results presented here agree with the limited experimental evidence
available.

We assume throughout this paper that the encoding and decoding 
are done for a discrete memoryless channel (DMC) characterized by 
the channel transition probabilities 

$$\{P(y_j \mid x_k),\quad 1 \le k \le K,\quad 1 \le j \le J\}$$

where $x_k$ represents a letter from the channel input alphabet and $y_j$
represents a letter from the channel output alphabet. The results for the
DMC apply with qualifications to other channels.

In the following sections we introduce the Fano algorithm, 2 the 
vehicle for this study of sequential decoding. 

II. THE DECODING PROCEDURE 

2.1 Tree Codes 

The Fano algorithm decodes data encoded from tree codes. We assume 
that this data arrives from a source as a sequence of digits and we make 
the assumption that these digits are statistically independent and are 
drawn from the $b$-letter alphabet, $A = \{a_1, a_2, \ldots, a_b\}$. A sequence
of source digits drawn from this alphabet is encoded with a tree code as 
follows (see Fig. 1): A branch from the first node of the tree is selected 
which corresponds to the value of the first digit produced by the source. 
The same is true for the second and later source outputs. Thus, in the 
example of Fig. 1, the source sequence (1, 0, 2, • • • ) with letters from 
the alphabet {0, 1, 2} selects the sequence (112, 010, 122, ••• ) from 
the tree. The digits on these branches are then transmitted over the 
DMC. 

We assume that each branch of the tree contains $l$ channel symbols
so that the source rate in bits per channel transmission is defined as

$$R = \frac{\log_2 b}{l}. \tag{1}$$

A variety of rates can be generated with tree codes.

Fig. 1 — Tree code.

A class of tree codes which may be generated with a small amount of 
equipment is known as convolutional codes. Convolutional codes are 
generated with shift registers, multipliers and adders. An example of a 
convolutional encoder is given in Fig. 2 where circles represent multipli- 
cation and addition is taken modulo b = 3. This example generates the 
tree code of Fig. 1. Although we do not restrict the results of this paper 
to convolutional codes, it suffices to say that they can be decoded with 
sequential decoders with a small resultant error rate. 1 - 18 
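
As an illustration of how a shift-register encoder generates a tree code, the
following Python sketch implements a modulo-3 convolutional encoder with three
channel symbols per branch. The tap values in `TAPS` are an assumption: they
were chosen only so that the source sequence (1, 0, 2) produces the branch
sequence (112, 010, 122) quoted in Section 2.1, and they are not taken from
Fig. 2. The function names are likewise hypothetical.

```python
import math

B = 3  # size of the source alphabet; all arithmetic is modulo B
# Hypothetical taps: TAPS[k][j] multiplies the source digit delayed by k
# branches and feeds the j-th channel symbol of the current branch.
TAPS = [(1, 1, 2), (0, 1, 0), (2, 0, 1)]


def encode(source):
    """Return one tuple of channel symbols (a branch) per source digit."""
    branches = []
    for n in range(len(source)):
        symbols = []
        for j in range(len(TAPS[0])):
            total = 0
            for k, taps in enumerate(TAPS):
                if n - k >= 0:  # the shift register is assumed to start empty
                    total += taps[j] * source[n - k]
            symbols.append(total % B)
        branches.append(tuple(symbols))
    return branches


def source_rate(b, l):
    """Source rate of (1) in bits per channel transmission."""
    return math.log2(b) / l


print(encode([1, 0, 2]))
print(source_rate(B, len(TAPS[0])))
```

Because each branch depends only on the current and earlier source digits,
two source sequences that share a prefix share the corresponding branches,
which is what gives the code its tree structure.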



2.2 The Metric 

The Fano algorithm decodes by comparing the received channel 
sequence to paths in the tree code in search of a path which "matches 
well" to the received sequence. A "match" is measured with a "metric." * 

* This is not a metric in the mathematical sense since it does not satisfy any of 
the rules for a metric. The word metric is used here to indicate the relative match 
(or mismatch) between the received channel sequence and a tree path.

Let $u_s$, $v_s$ represent a tree path and the received sequence, respectively,
where each has $s$ branches. Then, for the purposes of this paper we
define the metric between $u_s$ and $v_s$, $d(u_s, v_s)$, as

$$d(u_s, v_s) = \sum_{r=1}^{s} \sum_{h=1}^{l} \left[ \log_2 \frac{P(v_{rh} \mid u_{rh})}{f(v_{rh})} - R \right] \tag{2}$$



where $u_{rh}$, $v_{rh}$ are the $h$th digits on the $r$th branches of $u_s$, $v_s$,
respectively. $P(v_{rh} \mid u_{rh})$ is a channel transition probability. The
function $f(v_{rh})$ for $v_{rh} = y_j$ is given as

$$f(y_j) = \sum_{k=1}^{K} p_k P(y_j \mid x_k). \tag{3}$$



This function may be viewed as the probability of channel output
$y_j$ when the channel inputs are assigned with probabilities $\{p_k\}$. (The
function $f(y_j)$ and the probability assignment $\{p_k\}$ are chosen because
they fit naturally into the random code bound to be presented later.)
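
The metric of (2) and the output probability of (3) can be evaluated directly
once a channel is fixed. The sketch below does so for a binary symmetric
channel with crossover probability 0.05, equiprobable inputs and rate 1/2;
these numerical choices, the dictionary layout of the channel and the function
names are assumptions made for illustration only.

```python
import math


def output_probability(y, p_in, transition):
    """f(y) of (3): probability of output y under the input assignment p_in."""
    return sum(p_in[x] * transition[x][y] for x in transition)


def path_metric(path, received, rate, p_in, transition):
    """Metric of (2) between a tree path and the received sequence.

    path and received are lists of branches; each branch is a tuple of digits.
    """
    total = 0.0
    for sent_branch, recv_branch in zip(path, received):
        for u, v in zip(sent_branch, recv_branch):
            f_v = output_probability(v, p_in, transition)
            total += math.log2(transition[u][v] / f_v) - rate
    return total


# Assumed example channel: transition[input][output].
EPS = 0.05
BSC = {0: {0: 1 - EPS, 1: EPS}, 1: {0: EPS, 1: 1 - EPS}}
UNIFORM = {0: 0.5, 1: 0.5}

sent = [(1, 1), (0, 1), (0, 0)]
clean = path_metric(sent, sent, 0.5, UNIFORM, BSC)
noisy = path_metric(sent, [(1, 1), (1, 1), (0, 0)], 0.5, UNIFORM, BSC)
print(clean, noisy)
```

Each agreeing digit raises the metric slightly and each disagreeing digit
lowers it sharply, which is the behavior the search rules of the decoder rely
upon.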

This choice of metric is used because it lends itself to analysis and is
a metric with which the Fano algorithm will operate. We now study
this metric and observe from a simple combination of terms in (2) that
$d(u_s, v_s)$ is monotonically increasing in $P[v_s \mid u_s]$, which is the
probability of receiving the sequence $v_s$ when the sequence $u_s$ is
transmitted. This fact, plus the fact that all tree paths with the same number
of branches are assumed equiprobable, implies that $P[v_s \mid u_s]$ is
proportional, from Bayes' rule, to $P[u_s \mid v_s]$, the a posteriori
probability of sequence $u_s$ given that sequence $v_s$ is received.
Equivalently, this implies that $d(u_s, v_s)$ is monotonically increasing in
the a posteriori probability of tree path $u_s$. Thus, as the decoder
progresses into the tree we expect $d(u_s, v_s)$ to increase if $u_s$
represents the correct path (see Fig. 3). However, if the decoder branches
onto an incorrect path, we expect the path to decrease in probability (for a
properly chosen code) and to see $d(u_s, v_s)$ decrease (see Fig. 3). Although
this behavior is typical, occasional noisy intervals will cause the correct
path to decrease in metric and searching will be required to distinguish it
from incorrect paths.

Fig. 2 — Convolutional encoder.

THE BELL SYSTEM TECHNICAL JOURNAL, JANUARY 1966

Fig. 3 — Criteria and typical paths.

2.3 The Fano Algorithm 

The Fano algorithm is a set of rules for searching tree paths using 
the metric given by (2).* Since the algorithm is designed to find the 
transmitted tree path, it is programmed to follow a path which grows in 
metric. A path will be said to grow in metric if it crosses an increasing 
sequence of thresholds, such as those of Fig. 3. 

The decoder is also programmed to search for other paths when the 
path being followed begins to decrease in metric and crosses a threshold 
from above. Such a decrease signals the presence of channel noise and 
indicates that searching will be required to distinguish between the cor- 
rect path and incorrect paths. 

The rules governing such a search, as well as the rules for determining
which path to follow when two or more paths increase in metric, are given
by the flow chart† of Fig. 4. In that chart, a "most probable" branch at a
node is that branch for which the increase in the metric is largest.

* Other metrics with the same properties will also work.

The "running threshold", which is simply called "threshold" in the
flow chart, is really a sequence of thresholds which always lies below the
node being examined in the decoder (see Fig. 5). It is used to determine
whether the path being extended increases or decreases in metric. In
operations A, B, and C the metric on a node is compared with the running
threshold. The indicators OK and BAD signify, respectively, that
the metric is above or below this threshold.

Fig. 4 — Flow chart of the Fano algorithm.

In operation D, the statement "tighten threshold" means that the 
running threshold is to be increased until it lies just below the value of 
the metric on the node reached by the decoder. Notice that the threshold 
is increased only when a node is reached for the first time. Otherwise 
looping would occur. 



† This chart is based on a chart suggested by Prof. I. M. Jacobs of M.I.T. It
is equivalent to the flow chart of Ref. 2.

Fig. 5 — Threshold reduction, b = 2.

To further clarify the operations of the flow chart, we make the following
observations: (i) forward searching on a path whose path metric
continues to grow is performed by operations A and D; (ii) the searching
required after the searched path crosses the running threshold from
above is performed by operations B, E, C, and D; (iii) the running
threshold is reduced only if it is found that all paths cross this threshold
from above (see Fig. 5). This last observation deserves expansion. When
the decoder observes that the path under examination violates the run- 
ning threshold, it looks back, one node at a time, to find a path which it 
might extend forward. If, after a number of backward and forward 
moves, the decoder decides that all paths examined violate the running 
threshold, it reduces the value of this threshold and repeats the search 
until a path is found which remains above the new lower value of the 
threshold. (If there is more than one such path, the decoder follows that 
path which has the "most probable" branches.) The decoder then con- 
tinues to extend this path. 
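
The search rules just described can be sketched in Python. The sketch follows
the commonly described form of the Fano algorithm (forward look, first-visit
threshold tightening, backward look, threshold reduction); it is not claimed
to reproduce the flow chart of Fig. 4 operation by operation. The rate-1/2
binary encoder, the binary symmetric channel with crossover probability 0.05,
the threshold spacing `step` and all function names are assumptions chosen
for illustration.

```python
import math


def encode_branch(state, bit):
    """One step of an assumed rate-1/2 binary convolutional encoder."""
    s1, s2 = state
    return (bit ^ s1 ^ s2, bit ^ s2), (bit, s1)


def encode(bits):
    state, branches = (0, 0), []
    for bit in bits:
        branch, state = encode_branch(state, bit)
        branches.append(branch)
    return branches


def branch_gain(sent, received, eps, rate):
    """Metric increase on one branch for a BSC with equiprobable inputs."""
    gain = 0.0
    for u, v in zip(sent, received):
        gain += math.log2((1 - eps if u == v else eps) / 0.5) - rate
    return gain


def fano_decode(received, eps=0.05, rate=0.5, step=1.0):
    """Return (decoded bits, number of forward and backward looks)."""
    path, ranks = [], []            # digits chosen and their rank at each node
    states, metrics = [(0, 0)], [0.0]
    threshold, rank, looks = 0.0, 0, 0
    while len(path) < len(received):
        # Forward look at the rank-th most probable branch of this node.
        looks += 1
        options = []
        for bit in (0, 1):
            branch, nxt = encode_branch(states[-1], bit)
            gain = branch_gain(branch, received[len(path)], eps, rate)
            options.append((gain, bit, nxt))
        options.sort(key=lambda o: o[0], reverse=True)
        gain, bit, nxt = options[rank]
        if metrics[-1] + gain >= threshold:
            first_visit = metrics[-1] < threshold + step
            path.append(bit)
            ranks.append(rank)
            states.append(nxt)
            metrics.append(metrics[-1] + gain)
            if first_visit:  # tighten: raise threshold to just below metric
                while threshold + step <= metrics[-1]:
                    threshold += step
            rank = 0
            continue
        # The branch violates the running threshold: look back.
        while True:
            looks += 1
            if path and metrics[-2] >= threshold:
                path.pop()
                states.pop()
                metrics.pop()
                rank = ranks.pop() + 1
                if rank < 2:       # an untried branch remains at this node
                    break
            else:                  # all examined paths violate the threshold
                threshold -= step
                rank = 0
                break
    return path, looks


message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
sent = encode(message)
received = list(sent)
received[2] = (sent[2][0] ^ 1, sent[2][1])   # a single channel error
print(fano_decode(sent))
print(fano_decode(received))
```

On the error-free sequence the decoder makes one forward look per branch; the
single channel error forces backward moves and threshold reductions, so the
number of looks grows even though the decoded digits are unchanged.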

We now go on to discuss a particular buffer design and to examine 
the dynamics of the decoding operation. We shall return to the discus- 
sion of this section in a following section while discussing a random 
variable of computation. 



2.4 Dynamics of the Decoder 

A buffer designed to smooth the delay experienced by data arriving 
at the decoder is shown in Fig. 6. Data arrives from the left, is stored 
in sections corresponding to tree branches and progresses through the 
buffer at the rate at which it arrives. Storage is reserved below each 
branch for tentative source decisions. A safety zone is provided so that,
should a buffer overflow occur, data in this section will be declared
unreliable and not released to the user. 

Two pointers are shown on the buffer of Fig. 6. With these pointers 
the decoder operation may be traced. The "search" pointer locates the 
received branch currently being examined. The "extreme" pointer 
labels the latest branch ever examined. When the channel is relatively 
noise-free the two pointers hover at the left-hand side of the buffer. 
During a noisy interval, searching is required and the search pointer 
drifts to the right and away from the extreme pointer while the extreme 
pointer drifts to the right at the data rate. The two pointers become 
superimposed and move to the left after the noisy period has been passed. 
(We assume that the decoder has a computation rate which is twice or 
three times the average required computation rate.) 

Should the channel experience a severely noisy period, the search 
pointer may drift to the far right-hand side of the buffer at which time 
the decoder will quite probably release an erroneous source decision to 
the safety zone. This spells trouble because thereafter the decoder 
searches on incorrect paths and is likely to do a large amount of con- 
tinuous searching. Additional decoding errors will then be released to 
the user. This event we call buffer overflow. 
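
The interplay between the arrival rate, the speed of the logic unit and the
buffer capacity can be illustrated with a toy waiting-line model. This is a
deliberate simplification and not the buffer of Fig. 6: one branch is assumed
to arrive per time unit, each branch is assumed to need a fixed number of
computations before it can be released, and overflow is declared when more
than `capacity` branches are waiting. All names and numbers are hypothetical.

```python
from collections import deque


def first_overflow(work_per_branch, speed, capacity):
    """Return the arrival index at which the waiting line first exceeds
    capacity, or None if it never does."""
    waiting = deque()  # computations still owed to each undecoded branch
    for index, work in enumerate(work_per_branch):
        waiting.append(work)          # a new branch arrives every time unit
        budget = speed                # computations available this time unit
        while waiting and budget > 0:
            spent = min(budget, waiting[0])
            waiting[0] -= spent
            budget -= spent
            if waiting[0] == 0:
                waiting.popleft()     # oldest branch is finished
        if len(waiting) > capacity:
            return index
    return None


demand = [1, 1, 10, 1, 1, 1, 1, 1]    # one noisy branch among quiet ones
print(first_overflow(demand, speed=2, capacity=3))
print(first_overflow(demand, speed=2, capacity=5))
```

With a logic unit only twice as fast as the quiet-channel demand, a single
branch requiring ten computations is enough to fill the smaller buffer, while
the larger buffer absorbs the burst and the waiting line drains afterwards.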

Since buffer overflow can be detected by the location of the search 
pointer, the user can be so informed. However, no known techniques 
exist for retrieving the decoder from the overflow state once it has 
entered this state other than a restarting of the decoding process. This
implies that either a feedback link has to be available or that periodic
restarting is employed. Overflow, then, is a serious event. Since it can
be argued that the probability of an overflow is generally much larger (for
a properly chosen code) than the probability of an error without overflow,
overflow is a most important consideration in the design of a sequential
decoder.

In the next section we begin the analytical treatment of the overflow 
probability. Our intent is to indicate the dependence of this probability 
on the encoder and decoder parameters. 

2.5 Static Computation 

The overflow probability, $P_{BF}(N)$, is defined as the probability that
the buffer overflows on or before the $N$th source decision is released to
the safety zone. It is this probability which is of primary concern in
the design of the decoder. Unfortunately, both experimental 7 and
analytical 3 investigations of $P_{BF}(N)$ have produced only estimates of
this probability and these estimates depend upon a heuristic connection
between $P_{BF}(N)$ and probabilities which have either been determined
experimentally or bounded analytically. We shall be concerned
with the analytical bounds and shall present an interpretation of these
bounds.

Since $P_{BF}(N)$ is not amenable to direct analysis we shall be concerned
with a random variable of computation which we call "static" computation.
This is a computation associated with a node of the correct path.
We assume that the decoder reaches a node of the correct path, say the
$g$th, and we define static computation, $C$, as the number of computations
required on the $g$th correct node and on all nodes on paths branching
from this correct node except nodes on the correct path. This set of
nodes is called the "incorrect subset" associated with the $g$th correct
node. (See Fig. 1 where $g = 2$.) A computation on a node is defined as
a forward or backward "look" from a node. (See the flow chart of Fig. 4.)

The analytical results of this paper are concerned with bounds on the
cumulative probability distribution of the random variable of static
computation $C$, namely, $P[C \ge L]$. We shall determine the behavior
of $P[C \ge L]$ with the distribution parameter $L$. Before we do so,
however, we develop an upper bound to $C$ to be used later in developing an
upper bound to $P[C \ge L]$. We begin by labeling nodes in the incorrect
subset.

Each node in the $g$th incorrect subset can be labeled uniquely with a
doublet $(m, s)$. We take the index $s$ as a measure of the "penetration"
of a node in the incorrect subset. We say a node has penetration $s$ if
it is separated from the correct node by $s$ branches. The correct node
itself is at penetration zero. The index $m$ indicates the position of nodes
at penetration $s$ counting from the bottom of the incorrect subset (see
Fig. 1 where node (3, 2) is shown). We define $M(s)$ as the number of
nodes at penetration $s$ and have $1 \le m \le M(s)$ where

$$M(s) = \begin{cases} 1, & s = 0 \\ (b-1)\, b^{s-1}, & s > 0 \end{cases} \tag{4}$$

and $b$ is the number of branches at a node. (Note that $M(0) = 1$ since
the correct node has penetration zero.) Then, each node is uniquely
labeled by a doublet $(m, s)$.
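
The node count of (4) and the doublet labeling are easy to enumerate. The
following sketch (function names are hypothetical) lists $M(s)$ for the
ternary tree of Fig. 1 and confirms that the label (3, 2) exists there.

```python
def nodes_at_penetration(s, b):
    """M(s) of (4): number of nodes at penetration s in an incorrect subset."""
    if s < 0:
        raise ValueError("penetration must be non-negative")
    return 1 if s == 0 else (b - 1) * b ** (s - 1)


def labels(max_penetration, b):
    """Enumerate the doublets (m, s) for 0 <= s <= max_penetration."""
    return [(m, s)
            for s in range(max_penetration + 1)
            for m in range(1, nodes_at_penetration(s, b) + 1)]


print([nodes_at_penetration(s, 3) for s in range(4)])
print((3, 2) in labels(2, 3))
```

The count grows geometrically with penetration, which is the source of the
exponential growth of computation discussed in Section III.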

To develop an upper bound to the random variable $C$ we continue
the discussion of Section 2.3. Assuming that we have reached the $g$th
node of the correct path, defining $D$ as the smallest value of the path
metric on the remaining portion of the correct path and letting $T_D$ be
the threshold just below $D$ (see Fig. 7), we see from observation (iii) of
Section 2.3 that no threshold lying below $T_D$ is ever used. A lower
threshold would be required if all paths eventually crossed $T_D$ but, by
definition, at least one path, the correct path, remains completely above
$T_D$.

Fig. 7 — Typical path trajectories and the minimum threshold, $T_D$.

Consider a particular incorrect node $(m,s)$ with metric $d_0 + d^*(m,s)$,
where $d_0$ is the value of the metric on the path terminated by the $g$th
correct node and $d^*(m,s)$ is the remainder. If the thresholds are defined
by $T_i = i t_0$, $t_0 > 0$, $-\infty < i < \infty$, and if $d_0 + d^*(m,s)$
is separated from $T_D$ by $k$ such thresholds (including $T_D$), then the
decoder can look at node $(m,s)$ with at most $k$ thresholds.† It can be
shown that with each threshold no more than $(b+1)$ computations can be
performed at node $(m,s)$, i.e., one backward look and $b$ forward looks from
$(m,s)$. Since this is the case, we can define a random variable which counts
the number of thresholds between the value of the metric on node $(m,s)$
and threshold $T_D$ and we can use this random variable to bound $C$. To
simplify the analysis, however, we shall define a random variable $z_{i,s}(m)$
which allows us to overbound the number of thresholds between the value
of the metric on node $(m,s)$ and threshold $T_D$, including $T_D$. Represent
the metric on the correct path of length $g + r$ by $d_0 + d_c(u_r, v_r)$ where
$u_r$, $v_r$ are the portions of the correct and received paths extending
beyond the $g$th node, respectively. Then, we have the following definition
for $z_{i,s}(m)$:

$$z_{i,s}(m) = \begin{cases} 1, & d^*(m,s) \ge T_{i-1},\ d_c(u_r, v_r) \le T_{i+1} \text{ for some } r \ge 0 \\ 0, & \text{otherwise.} \end{cases} \tag{5}$$

We now argue that $\sum_{i=-\infty}^{\infty} z_{i,s}(m)$ overbounds the number
of thresholds between the value of the metric on node $(m,s)$,
$d_0 + d^*(m,s)$, and threshold $T_D$, including this threshold. Let $d_0$, the
value of the path metric on the correct path up to and including the $g$th
node, be between thresholds $T_0$ and $T_0 + t_0$. Then, if the metric on node
$(m,s)$ were $T_0 + t_0 + d^*(m,s)$ instead of $d_0 + d^*(m,s)$, the number of
thresholds with which $(m,s)$ would be examined would be increased. Similarly,
if the path metric on the correct path of length $g + r$,
$d_0 + d_c(u_r, v_r)$, were replaced by $T_0 + d_c(u_r, v_r)$, the
computation on node $(m,s)$ would again be increased. These observations
are used to define $z_{i,s}(m)$ in such a way that
$\sum_{i=-\infty}^{\infty} z_{i,s}(m)$ overbounds the number of thresholds
between $(m,s)$ and $T_D$.

We have stated that no more than $(b + 1)$ computations are needed
for each threshold lying between $(m,s)$ and $T_D$. Then, we overbound
the random variable of static computation on nodes of the $g$th incorrect
subset, $C$, by

$$C \le (b+1) \sum_{i=-\infty}^{\infty} \sum_{s=0}^{\infty} \sum_{m=1}^{M(s)} z_{i,s}(m). \tag{6}$$

This bound will be used in overbounding $P[C \ge L]$.

In Section III the analytical results will be delineated and interpreted
and the upper bound to $P[C \ge L]$ will be derived.

† This may be deduced from the flow chart of Fig. 4.

III. THE DISTRIBUTION OF STATIC COMPUTATION

Static computation has been defined as the computation performed in
a particular incorrect subset of the tree code. A lower bound to the proba- 
bility distribution of this random variable has been obtained, and is 
presented elsewhere. An abbreviated derivation of an upper bound to 
this distribution will be presented in a following section. The essence of 
the lower bound argument is contained in the following section. 



3.1 Behavior of the Distribution 

It has been found 3,7 that the distribution of static computation,
$P[C \ge L]$, behaves as $L^{-\beta}$ for large $L$. We shall now present
several simple intuitive arguments which explain this behavior.

If noise causes a large dip in the value of the correct path metric in 
the neighborhood of the $g$th correct node (see Fig. 7), then the decoder
will not be able to discriminate between the correct path and incorrect 
paths. Thus, much computation will be required. Since the number of 
paths in the incorrect subset grows exponentially with penetration into 
this subset, the number of computations required will grow roughly as 
an exponential in the length of the correct path dip or the duration of 
the interval of high channel noise. Hence, the static computation grows 
exponentially with an interval of high channel noise. On the other hand, 
an interval of high channel noise occurs on the DMC with a probability which
decreases exponentially with its length. It is the balance between these 
two exponentials which is responsible for the behavior of the distribution 
of static computation. Random variables with distributions of this type 
are known as Paretian random variables and they appear in random
walk problems, 6 in the distribution of incomes, in error clustering 
on the telephone channels 15 and many other places. 14 
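
The balance between the two exponentials can be made explicit with a rough
calculation. The constants $\gamma$ and $\alpha$ below are illustrative
assumptions and are not defined in the paper: suppose that a noisy interval
spanning $n$ branches forces about $2^{\gamma n}$ computations and occurs with
probability about $2^{-\alpha n}$.

```latex
\begin{align*}
C \approx 2^{\gamma n} \ge L
  &\iff n \ge \frac{\log_2 L}{\gamma}, \\
P[C \ge L]
  &\approx P\!\left[\, n \ge \frac{\log_2 L}{\gamma} \right]
   \approx 2^{-\alpha (\log_2 L)/\gamma}
   = L^{-\alpha/\gamma}.
\end{align*}
```

Under these assumptions the tail of the distribution is algebraic rather than
exponential in $L$, with the ratio $\alpha/\gamma$ playing the role of the
Pareto exponent.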

3.2 Random Code Bound on the Distribution 

The technique used in this section to overbound the distribution of 
computation contains two major steps. In the first step, the distribution 
is bounded in terms of the moments of computation using a generaliza- 
tion of Chebysheff's Inequality. In the second step, the moments of 
computation are averaged over the ensemble of all tree codes. Together 
the two steps generate a random code bound to the distribution. This 
argument shows the existence of codes having a particular upper bound 
to their distribution function. The generalized Chebysheff Inequality is 
stated below.

Lemma 1: Let $C$ be a positive random variable. Then,

$$P[C \ge L] \le \overline{C^p} / L^p, \qquad p \ge 0, \tag{7}$$

where the overbar denotes an average.

The "tightness" of this inequality is indicated by two examples.
(i) For the discrete random variable which assumes values $0$ and $c_0$
with probabilities $1 - a$ and $a$, respectively, the bound is exact when
$L = c_0$. (ii) For the continuous random variable which assumes values
greater than or equal to one with probability density $\beta / c^{\beta+1}$,
the exact form of $P[C \ge L]$ is $1/L^{\beta}$ and the bound is
$\beta / [(\beta - p) L^p]$ for $p < \beta$. Therefore, as $p$ approaches
$\beta$ the coefficient in the bound becomes indefinitely
large while the exponent approaches the true exponent. Since the
distribution of static computation is Paretian, this same behavior appears in
the random code bound derived in this section.
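
Both examples can be checked numerically. In the sketch below the Pareto
exponent 1.5, the level $L = 100$ and the two-point values are arbitrary
illustrative choices, and the function names are hypothetical.

```python
def pareto_tail(L, beta):
    """Exact P[C >= L] for density beta / c**(beta + 1) on c >= 1."""
    return L ** (-beta) if L >= 1 else 1.0


def pareto_moment(p, beta):
    """p-th moment of the same variable; finite only for p < beta."""
    if p >= beta:
        raise ValueError("moment diverges for p >= beta")
    return beta / (beta - p)


def moment_bound(L, p, moment):
    """Right-hand side of (7)."""
    return moment / L ** p


beta, L = 1.5, 100.0
for p in (0.5, 1.0, 1.4, 1.49):
    print(p, moment_bound(L, p, pareto_moment(p, beta)), pareto_tail(L, beta))

# Example (i): values 0 and c0 with probabilities 1 - a and a.
a, c0, p = 0.2, 7.0, 3
print(moment_bound(c0, p, a * c0 ** p))
```

As $p$ is raised toward $\beta$ the factor $L^{-p}$ approaches the true tail
while the coefficient $\beta/(\beta - p)$ grows, so the best choice of $p$ for
a given $L$ lies strictly below $\beta$.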

The random variable of computation $C$ has been overbounded by (6).
It should be clear that moments of the bound on C will be difficult to 
evaluate due to the many crossterms. Much of the difficulty is avoided 
through the use of Minkowski's Inequality 9 which is stated below. 

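Lemma 2: Let $x_1, x_2, \ldots, x_n$ be a set of positive random variables.
Then, for $p \ge 1$ and for every $n$ we have

$$\left[\, \overline{\left( \sum_{i=1}^{n} x_i \right)^{p}} \,\right]^{1/p} \le \sum_{i=1}^{n} \left[\, \overline{x_i^{\,p}} \,\right]^{1/p}. \tag{8}$$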
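
Minkowski's inequality can be checked on a small finite sample space. The
three-outcome space, the two random variables and the function names below
are assumptions made purely for illustration.

```python
def moment_root(values, probs, p):
    """(E[x**p])**(1/p) for a random variable on a finite sample space."""
    return sum(q * v ** p for v, q in zip(values, probs)) ** (1.0 / p)


def minkowski_sides(variables, probs, p):
    """Return (left, right) sides of (8) for the given random variables."""
    totals = [sum(column) for column in zip(*variables)]
    left = moment_root(totals, probs, p)
    right = sum(moment_root(x, probs, p) for x in variables)
    return left, right


probs = [0.5, 0.3, 0.2]            # assumed three-outcome sample space
x1 = [1.0, 0.0, 2.0]               # value of x1 on each outcome
x2 = [0.0, 3.0, 1.0]
for p in (1, 2, 3.5):
    print(p, minkowski_sides([x1, x2], probs, p))
```

The value of the inequality is that the right-hand side involves only
moments of the individual variables, so no cross terms need to be evaluated.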



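Applying this inequality to the bound on $C$ we have

$$\left[\, \overline{C^p} \,\right]^{1/p} \le (b+1) \sum_{i=0}^{\infty} \sum_{s=0}^{\infty} \left[\, \overline{\left( \sum_{m=1}^{M(s)} z_{i,s}(m) \right)^{p}} \,\right]^{1/p} + (b+1) \sum_{i=1}^{\infty} \sum_{s=0}^{\infty} \left[\, \overline{\left( \sum_{m=1}^{M(s)} z_{-i,s}(m) \right)^{p}} \,\right]^{1/p}. \tag{9}$$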



In this form, moments are taken of the sum of the random variables
$z_{i,s}(m)$, $1 \le m \le M(s)$, with both the threshold $T_i$ and the
penetration $s$ fixed. (See the definition of $z_{i,s}(m)$ in (5).)

To further bound $\overline{C^p}$ we make the following expansion for integer
values of $p$ where the indices $i$ and $s$ are omitted:

$$\left( \sum_{m=1}^{M(s)} z(m) \right)^{p} = \sum_{m_1=1}^{M(s)} \cdots \sum_{m_p=1}^{M(s)} z(m_1) z(m_2) \cdots z(m_p). \tag{10}$$

Since such an expansion does not hold for noninteger $p$, we limit our
attention hereafter to integer $p$. We now proceed through several counting
arguments to put (10) in a manageable form.

Since the random variable $z(m_1) z(m_2) \cdots z(m_p)$ assumes the value 1
only when all implied events occur simultaneously, the expectation of each
term on the right-hand side of (10) is the probability of the joint occurrence
of all implied events. Now it can be seen for $p = 4$, say, that

$$z(5) z(1) z(16) z(5) = z(16) z(5) z(5) z(1) = z(1) z(5) z(16)$$

since $z(\cdot) = 1$ or $0$ (so that $z(5) z(5) = z(5)$). Hence,
$z(m_1) \cdots z(m_p)$ is independent of the order of the $m_i$ and equals
$z(\theta_1) \cdots z(\theta_t)$ where $\theta_1, \ldots, \theta_t$ are the
distinct elements among $m_1, m_2, \ldots, m_p$. Consequently we can write

$$\sum_{m_1=1}^{M(s)} \cdots \sum_{m_p=1}^{M(s)} z(m_1) \cdots z(m_p) = \sum_{t=1}^{\min[M(s),\,p]^{*}} \ \sum_{\substack{\text{all sets of } t \\ \text{distinct elements}}} W(t,p)\, z(\theta_1) \cdots z(\theta_t) \tag{11}$$

where $W(t,p)$ is the number of $p$-tuples $(m_1, m_2, \ldots, m_p)$ which
contain a given set of $t$ distinct elements. We now bound $W(t,p)$.

$W(t,p)$ may be viewed as the number of ways of placing one ball in
each of $p$ distinguishable cells where the balls are of $t$ different colors
and each color must appear at least once. The number of such collections
of $p$ balls is less than the number of collections one would have if
we include the situations where one or more colors do not appear. This
larger number is the number of ways of placing one of $t$ different elements
in each of $p$ distinguishable cells, or $t^p$. Therefore, $W(t,p) \le t^p$.

To underbound $W(t,p)$ we now establish that $W(t,p) \ge t\,W(t,p-1)$.
Consider $W(t,p-1)$, the number of ways $(p-1)$ balls of $t$ different
colors may be placed in $(p-1)$ distinguishable cells with no cell empty.
Consider extending the collection by placing one additional ball with
one of the $t$ colors in a $p$th cell. This new collection contains
$t\,W(t,p-1)$ items. It cannot contain more items than does the collection of
$W(t,p)$ items because in it one color appears at least twice and every other
color at least once, establishing the desired inequality. Iterating this
inequality $(p-t)$ times and observing that $W(t,t) = t!$, we have
$W(t,p) \ge t^{p-t}\, t!$. The two bounds are summarized in the following
Lemma:

Lemma 3: For $t \le p$ we have

$$\sqrt{2\pi t}\; e^{-t}\, t^{p} \le W(t,p) \le t^{p}. \tag{12}$$

Proof: We use the fact 6 that

$$t! \ge t^{t} \sqrt{2\pi t}\; e^{-t}. \qquad \text{Q.E.D.}$$

The two bounds indicate that $W(t,p)$ grows with $t$ primarily as $t^p$ for
large $p$.

* The upper limit on $t$ indicates that $(m_1, \ldots, m_p)$ contains no more
than the smaller of $M(s)$ and $p$ different elements.
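
For small $t$ and $p$ the count $W(t,p)$ can be obtained by brute force, which
allows the bounds of (12) and the regrouping in (11) to be checked directly.
The sketch below is illustrative; the function names are hypothetical, and
the regrouping check assumes that exactly `ones` of the variables $z(m)$ equal
1 so that the left-hand side of (11) is `ones ** p`.

```python
import math
from itertools import product


def count_tuples(t, p):
    """W(t, p): p-tuples over t given symbols in which every symbol appears."""
    return sum(1 for tup in product(range(t), repeat=p) if len(set(tup)) == t)


def check_bounds(t, p):
    """True when Stirling bound <= t**(p-t) * t! <= W(t, p) <= t**p."""
    w = count_tuples(t, p)
    lower = t ** (p - t) * math.factorial(t)
    stirling = math.sqrt(2 * math.pi * t) * math.exp(-t) * t ** p
    return stirling <= lower <= w <= t ** p


def power_via_11(ones, p):
    """Evaluate (sum of z)**p through (11) when `ones` of the z(m) equal 1."""
    return sum(count_tuples(t, p) * math.comb(ones, t)
               for t in range(1, min(ones, p) + 1))


print(count_tuples(2, 4), count_tuples(3, 4))
print(all(check_bounds(t, p) for p in range(1, 7) for t in range(1, p + 1)))
print(power_via_11(3, 4))
```

The enumeration is exponential in $p$ and is meant only as a check on the
counting argument, not as a practical way of evaluating $W(t,p)$.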

Before we proceed to the next counting argument we motivate our use
of this next argument by presenting a result which is too long to be derived
here. 3 The probability $\overline{z_{i,s}(\theta_1) \cdots z_{i,s}(\theta_t)}$
is the probability that nodes $(\theta_1, s), \ldots, (\theta_t, s)$, all at
penetration $s$ in the $g$th incorrect subset, simultaneously lie in metric
above threshold $T_{i-1} = (i - 1) t_0$ while the correct path falls below
$T_{i+1}$ somewhere following the $g$th correct node (see (5)). An overbound
to this probability is given below. The average of the product of the $z$'s is
taken over the ensemble of channel transitions and the set of all tree codes.
It is at this point that the random code technique is used.

nlill/U+t)] (*(-*)] . ^-"'"ij J V 2 -r °" anR- ' 1 ' (,,(l)l I 

U=i J ' 

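Here $a$ is the number of branches on the paths terminated by nodes
$(\theta_1, s), \ldots, (\theta_t, s)$, exclusive of branches preceding the
$g$th correct node, and

$$f(y_j) \triangleq \sum_{k=1}^{K} p_k P(y_j \mid x_k) \tag{14}$$

where $\sigma_0 \le 0$. Also,

$$R_t \triangleq -\frac{1}{t} \log_2 \sum_{j=1}^{J} \left( \sum_{k=1}^{K} p_k P(y_j \mid x_k)^{1/(1+t)} \right)^{1+t}. \tag{15}$$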
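
The rate $R_t$ is straightforward to evaluate numerically. The sketch below
assumes that (15) is the familiar random-coding expression
$R_t = -\frac{1}{t} \log_2 \sum_j \bigl( \sum_k p_k P(y_j \mid x_k)^{1/(1+t)} \bigr)^{1+t}$
and evaluates it for a binary symmetric channel with crossover probability
0.05 and equiprobable inputs; the channel, the values of $t$ and the function
name are illustrative assumptions.

```python
import math


def rate_exponent(t, p_in, transition):
    """R_t for input probabilities p_in[k] and transition[k][j] = P(y_j | x_k)."""
    outer = 0.0
    for j in range(len(transition[0])):
        inner = sum(p_in[k] * transition[k][j] ** (1.0 / (1.0 + t))
                    for k in range(len(p_in)))
        outer += inner ** (1.0 + t)
    return -math.log2(outer) / t


EPS = 0.05                                   # assumed BSC crossover probability
BSC = [[1 - EPS, EPS], [EPS, 1 - EPS]]
UNIFORM = [0.5, 0.5]
for t in (0.001, 0.5, 1, 2, 4):
    print(t, rate_exponent(t, UNIFORM, BSC))
```

For this channel the value at $t = 1$ reduces to
$1 - \log_2\bigl(1 + 2\sqrt{\epsilon(1-\epsilon)}\bigr)$, and the computed
rates decrease as $t$ increases, so requiring $R < R_t$ for larger $t$ is a
progressively stronger restriction on the source rate.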

The probability assignment $\{p_k\}$ is the assignment given to digits in a
code when using the random code argument. In the bound of (13) there
exists a value of $\sigma_0$, $-t/(1+t) < \sigma_0 \le 0$, for which the sum
on $r$ converges, as long as $R < R_t$. Examination of (13) will show that the
bound depends on the paths terminated by nodes $(\theta_1, s)$,
$(\theta_2, s), \ldots, (\theta_t, s)$ only through $a$, the number of
branches which they contain, exclusive of branches preceding the $g$th correct
node. (For example, see Fig. 8 where a set of paths is indicated with checks
and the branches which they contain are labeled with 1.) This being the case,
we must then group together in (11) all paths having the same number, $a$, of
branches. Call this number of paths $N_t(a)$. The following lemma provides
a bound on $N_t(a)$.

Fig. 8 — Topology of tree paths.

Lemma 4: $N_t(a) \le (t-1)!\,(s+1)^{t-2}\,2^{alR}$.

Proof: The proof is by construction. We first show that

$$N_t(a) \le (t-1)!\,s^{t-2}\,2^{alR} \quad \text{for } s \ge 1. \tag{16}$$

Consider placing the $t$ paths in the tree, one by one. The first of the
$t$ paths placed in the incorrect subset of the tree (containing
$M(s) \le b^s$ paths) may assume no more than $b^s$ positions. A second
path connecting with the first, but having $d_1$ separate branches, may
assume any one of $b^{d_1}$ positions since its point of connection to
the first path is fixed by its length $d_1$. A third path with $d_2$
branches distinct from the first two may connect to either path and
terminate in $b^{d_2}$ positions, that is, it can assume no more than
$2b^{d_2}$ places. The $t$th path having $d_{t-1}$ branches may terminate
in any one of $b^{d_{t-1}}$ positions; hence, it can be situated in no
more than $(t-1)\,b^{d_{t-1}}$ places. Thus, given that the second path
has $d_1$ branches distinct from the first, that the third path has $d_2$
branches distinct from the first and second, etc., the number of
arrangements of the $t$ paths cannot exceed $(t-1)!\,b^a$ where

$$a = s + d_1 + \cdots + d_{t-1},$$

the number of branches on these paths. All that remains is to determine
the number of ways that values may be assigned to
$d_1, d_2, \ldots, d_{t-2}$. (Note that $d_{t-1}$ is fixed given $a$ and
$d_1, d_2, \ldots, d_{t-2}$.) Since each number $d_i$ represents a
portion of a path, $1 \le d_i \le s$, we may assign values to
$d_1, d_2, \ldots, d_{t-2}$ in no more than $s^{t-2}$ ways. Hence, the
number of arrangements of $t$ paths containing $a$ branches cannot exceed
$(t-1)!\,s^{t-2}\,b^a$. Observing that $b = 2^{lR}$ by (1) we have the
desired result for $s \ge 1$. We also have $s \le a \le st$ since one
path contains $s$ branches and the number of branches on all paths
cannot exceed $st$. Now, when $s = 0$, the bound on $N_t(a)$ is zero. We
must have $N_t(a) = 1$ when $s = 0$, so that we replace $s$ by $(s+1)$.
Q.E.D.
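The counting argument can be checked by brute force on very small trees. The following is only a sketch under stated assumptions: it takes $N_t(a)$ to be the number of ordered $t$-tuples of distinct nodes at penetration $s$ in a full $b$-ary tree, identifies each branch with the node it enters, and writes the bound as $(t-1)!\,(s+1)^{t-2}\,b^a$ (that is, before substituting $b = 2^{lR}$). The function names are hypothetical.

```python
from collections import Counter
from itertools import permutations, product
from math import factorial


def count_arrangements(b, s, t):
    """Count ordered t-tuples of distinct depth-s nodes in a full b-ary tree.

    The result maps a (number of distinct branches on the union of the
    t root-to-node paths) to the number of tuples with that value of a.
    Assumption: a branch is identified with the node it enters.
    """
    leaves = list(product(range(b), repeat=s))
    counts = Counter()
    for combo in permutations(leaves, t):
        branches = {leaf[:d] for leaf in combo for d in range(1, s + 1)}
        counts[len(branches)] += 1
    return counts


def lemma_bound(b, s, t, a):
    """Bound of Lemma 4 written with b**a in place of 2**(a*l*R); needs t >= 2."""
    return factorial(t - 1) * (s + 1) ** (t - 2) * b ** a


if __name__ == "__main__":
    for b, s, t in [(2, 2, 2), (2, 3, 3), (3, 2, 3)]:
        for a, n in sorted(count_arrangements(b, s, t).items()):
            print(b, s, t, a, n, lemma_bound(b, s, t, a))
```

On these small cases the bound is loose, which is consistent with its use in the text as an overbound only.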

Combining all terms we have the following random code bound on 
the moments of static computation : 



-prill' 



'J (. + 1)— 2 exp ( -;{[\- p ^ )] (2"' (l -*V) 
X {(i; 2 exp { -rlWB - h^m) 1 ' [g 2 



Up 



exp 



V 



(17) 



+ (jt 2 exp { - rl[a'R - v,W)\)j* [g 2 

cxp( + ^ /(1 ^ )+ ^ )]}. 

It can be shown 3 that a and a can be chosen for R < R p such that 
a > -h * < ~(p)/(l + V), & ~ lh(*) > and aR - » v (<r') > 0, 
that is, such that the bound converges for R < R p . 

We now have enough results available to draw the central conclusion
concerning moments of the random variable of static computation, $C$.
For integer $p$, $\overline{C^p}$, as an average over the ensemble of all
tree codes, is finite for source rates $R$ strictly less than the rate
$R_p$ (given by (15)). Since it can be shown that
$R_1 \ge R_2 \ge R_3 \ge \cdots \ge 0$, we have for any given rate $R$
that moments $\overline{C}, \overline{C^2}, \ldots, \overline{C^k}$ are
bounded where $k$ is the largest integer such that $R < R_k$. We cannot
determine from this bounding argument whether moments
$\overline{C^{k+1}}$, $\overline{C^{k+2}}$, etc. are finite or not.
Returning to Chebysheff's Inequality we overbound the cumulative
probability distribution of the random variable of static computation,
$P[C \ge L]$, using the bounds on the moments.
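As an illustration of how the rates $R_p$ order the moments, the sketch below evaluates the expression for $R_p$ on a discrete memoryless channel and finds the largest integer $k$ with $R < R_k$. It is a minimal sketch, not part of the original analysis: the function names, the transition-matrix layout `P[k][j]`, and the choice of a BSC with crossover probability 0.01 and equiprobable inputs are assumptions made for the example.

```python
import math


def rate_R(p_in, P, rho):
    """R_rho = -(1/rho) log2 sum_j ( sum_k p_k P[k][j]**(1/(1+rho)) )**(1+rho).

    p_in[k] is the input assignment and P[k][j] the transition probability
    from input k to output j (layout chosen for this example).
    """
    total = 0.0
    for j in range(len(P[0])):
        inner = sum(pk * P[k][j] ** (1.0 / (1.0 + rho)) for k, pk in enumerate(p_in))
        total += inner ** (1.0 + rho)
    return -math.log2(total) / rho


def bsc(p0):
    """Transition matrix of a binary symmetric channel with crossover p0."""
    return [[1.0 - p0, p0], [p0, 1.0 - p0]]


def largest_bounded_moment(p_in, P, R, k_max=50):
    """Largest integer k with R < R_k (0 if R >= R_1), searched up to k_max."""
    k = 0
    while k < k_max and R < rate_R(p_in, P, k + 1):
        k += 1
    return k


if __name__ == "__main__":
    p_in, P = [0.5, 0.5], bsc(0.01)
    for k in range(1, 6):
        print(k, round(rate_R(p_in, P, k), 4))
    print(largest_bounded_moment(p_in, P, 0.5))
```

For this example channel the computed rates decrease with $k$, so the search in `largest_bounded_moment` can stop at the first $k$ for which $R \ge R_{k+1}$.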

Theorem 1: With a probability of at least 0.9, a tree code drawn from the
ensemble of tree codes will have a distribution of static computation,
$P[C \ge L]$, which is bounded by $10\,\overline{C^k}/L^k$ where $k$ is
the largest integer such that $R < R_k$, $R$ is the source rate, $R_k$ is
given by

$$R_k = -\frac{1}{k}\log_2 \sum_{j=1}^{J}\left(\sum_{i=1}^{K} p_i\,P(y_j \mid x_i)^{1/(1+k)}\right)^{1+k} \tag{18}$$

and $\overline{C^k}$ is the random code bound on the moments of static
computation. For any larger $k$ a finite bound on $\overline{C^k}$ is not
known.

Proof: Over the ensemble of tree codes $P[C \ge L]$ is a random variable.
Then, if we let $x$ represent $P[C \ge L]$, we have

$$\bar{x} \ge \sum_{x \ge 10\bar{x}} x\,p(x) \ge 10\,\bar{x}\,P[x \ge 10\bar{x}]$$

from which we have that $P[x < 10\bar{x}] \ge 0.9$.

Q.E.D.
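The inequality used in this proof holds for any nonnegative random variable, including an empirical sample. The sketch below illustrates it; the heavy-tailed sample drawn with `random.paretovariate` is an arbitrary stand-in chosen for the example and is not the ensemble of tree codes.

```python
import random


def fraction_below_ten_times_mean(values):
    """Fraction of nonnegative samples strictly below ten times their mean."""
    mean = sum(values) / len(values)
    return sum(1 for v in values if v < 10 * mean) / len(values)


if __name__ == "__main__":
    random.seed(0)
    samples = [random.paretovariate(1.2) for _ in range(10000)]
    print(fraction_below_ten_times_mean(samples))
```

Whatever the sample, the returned fraction cannot fall below 0.9 as long as the values are nonnegative and not all zero, which is exactly the statement $P[x < 10\bar{x}] \ge 0.9$.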

This theorem summarizes the major result of this section, which is
that the distribution of computation decreases as fast as $L^{-k}$ where
$k$ is the largest integer such that $R < R_k$. We note that
$\overline{C^k}$ becomes indefinitely large as $R$ approaches $R_k$. This
was predicted by the discussion which followed the introduction of
Chebysheff's Inequality.

In the next section we interpret the bounds on the distribution of 
static computation and relate these bounds to the probability of a 
buffer overflow. 

IV. STATIC COMPUTATION AND THE OVERFLOW PROBABILITY 

The upper bound to the distribution of computation given above and 
a lower bound presented elsewhere' 1 both are algebraic functions of the 
distribution parameter, that is, $P[C \ge L]$ behaves as $L^{-\beta}$,
$\beta > 0$, for large $L$. In the following sections we define a
quantity called the "computation exponent" which extracts from
$P[C \ge L]$ its behavior with $L$.
The computation exponent is compared to known exponents on the 
probability of error and with some experimental data. Also, a heuristic 
connection between the overflow probability and the distribution of 
static computation is established. We begin with a discussion of the
computation exponent. 

4.1 The Computation Exponent 

The "computation exponent," $e(R)$, as defined below, is a measure
of the tail behavior of $P[C \ge L]$, that is, its behavior with $L$ for
large $L$.

$$e(R) \triangleq R\left(\lim_{L \to \infty} \frac{-\log P[C \ge L]}{\log L}\right) \tag{19}$$

If $P[C \ge L]$ behaves as $L^{-\beta}$ for large $L$, then
$\beta = e(R)/R$. We consider the computation exponent $e(R)$ rather
than $\beta$ because $e(R)$ is a bounded function while $\beta$ is not.
We now state bounds that have been obtained on $e(R)$ by over- and
under-bounding $P[C \ge L]$. We note that a channel is "completely
connected" if all of its transition probabilities are nonzero, i.e., all
output symbols can be "reached" from every input symbol, the normal
physical situation.
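Definition (19) suggests a simple way to estimate a computation exponent from tail data: fit the slope of $-\log P[C \ge L]$ against $\log L$ and multiply by $R$. The following is a rough sketch under that reading; the least-squares fit and the function names are choices made for the example, and a finite data set only approximates the limit in (19).

```python
import math


def tail_slope(Ls, tail_probs):
    """Least-squares slope of -log P[C >= L] against log L."""
    xs = [math.log(L) for L in Ls]
    ys = [-math.log(p) for p in tail_probs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def computation_exponent(R, Ls, tail_probs):
    """Estimate e(R) = R * beta from samples of the tail P[C >= L]."""
    return R * tail_slope(Ls, tail_probs)


if __name__ == "__main__":
    Ls = [10.0, 100.0, 1000.0, 10000.0]
    probs = [L ** -1.5 for L in Ls]  # synthetic tail with beta = 1.5
    print(computation_exponent(0.5, Ls, probs))
```

With an exactly algebraic tail, as in the synthetic data above, the fitted slope recovers $\beta$ and the estimate equals $R\beta$ up to rounding.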

Theorem 2: Codes do not exist for the completely connected DMC which 
have a computation exponent greater than e (R ) where 

e{R)± i-a)(R ~ /min) (20) 

and <j, — 1 ^ o- ^ 0, is the solution to 

B = max^. (21) 

Here, 7*(<r) is given by 



and 



/nun 4 min log 2 ^|% } 



**> -£»*(£)• 



(22) 



Theorem 3: On the DMC, codes exist which have a computation exponent
greater than or equal to $\underline{e}(R)$ where

$$\underline{e}(R) = pR \tag{23}$$

and $p$, $p = 1, 2, 3, \ldots$, is found from $R_{p+1} \le R < R_p$.
$R_p$ is given by (18).

The probability assignment $\{p_k\}$ appears in the random code argument
and in the definition of the metric through the function $f(y_j)$.
Although this will not be done here, one can choose $\{p_k\}$ to
maximize $\underline{e}(R)$.

As an example, the two bounds are sketched in Fig. 9 for the Binary
Symmetric Channel (BSC) with crossover probability of $p_0 = 0.01$. In
that figure we have chosen $p_k = 1/2$, $k = 1, 2$. For this probability
assignment $\underline{e}(R)$ is zero for $R$ greater than or equal to
channel capacity. For other assignments $\underline{e}(R)$ may intercept
the rate axis at a rate which exceeds channel capacity.

4.2 An Experimental Result, A Conjecture and An Interpretation 

Recently a computer simulation was made of the Fano algorithm for 
a number of channels including the BSC. This simulation study 7 was 
performed at the M.I.T. Lincoln Laboratory under the direction of 
K. L. Jordan. Mr. Jordan has generously provided the author with 
data from a particular simulation of a BSC with crossover probability 
of po = 0.01. 

In this simulation, a convolutional tree code of the type described in
Section 2.1 with $b = 2$ was used (hence, source rates of $R = 1/l$,
$l$ an integer, were available). The generator for this tree code was
optimized in the manner found in Ref. 1.




Fig. 9. Computation exponent bounds for the BSC with $p_0 = 0.01$.
An empirical distribution of a particular random variable of
computation was measured. This random variable differs somewhat from
static computation, but one can argue heuristically that it is within a
small multiple of the random variable of static computation when either
random variable is large. The random variable measured in the simulation
is the number of computations required to advance one node into the
tree. For example, when the channel is not noisy, a forward "look" will
indicate that a forward move is possible and only one computation will
be necessary; however, if the channel is noisy, the decoder may have to
do much backward searching before a path is found upon which the point
of deepest penetration into the tree can be increased.

The empirical distribution of computation is shown in Fig. 10 for 
R = \. The corresponding computation exponent for this rate is shown 
in Fig. 9. The data from which the empirical distribution was deter- 
mined represents the transmission of over one million channel digits. 
Although data at rates R = §, f was available, it was not deemed re- 
liable and not used because few cases of large computation occurred. 

The experimental computation exponent and the derivation of the
lower bound to $e(R)$, namely $\underline{e}(R)$, lead one to conjecture
a "true" value for the computation exponent.

Fig. 10. Empirical distribution of computation.

Conjecture:† For the metric used in this paper the computation exponent
for the Fano algorithm cannot exceed $e^*(R)$ and there exist codes
with this computation exponent where $e^*(R)$ is given by

$$e^*(R) = pR_p \quad \text{for } R = R_p \tag{24}$$

Here $p$ assumes all nonnegative, real values and $R_p$ is given by (18).
Here we use that probability assignment $\{p_k\}$, for each $p$, that
maximizes $R_p$.
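To make (23) and (24) concrete, the sketch below evaluates both exponents for a BSC. It rests on several assumptions that are not taken from the text: equiprobable inputs are used for every $p$ (for the BSC this is believed to be the maximizing assignment, but it is not verified here), the equation $R_p = R$ is solved for real $p$ by bisection, and all function names are hypothetical.

```python
import math


def R_rho(rho, p0):
    """R_rho from (18) for a BSC with crossover p0 and equiprobable inputs."""
    inner = 0.5 * ((1.0 - p0) ** (1.0 / (1.0 + rho)) + p0 ** (1.0 / (1.0 + rho)))
    return -math.log2(2.0 * inner ** (1.0 + rho)) / rho


def lower_exponent(R, p0, p_max=200):
    """Bound (23): p*R with integer p such that R_{p+1} <= R < R_p (0 if R >= R_1)."""
    p = 0
    while p < p_max and R < R_rho(p + 1, p0):
        p += 1
    return p * R


def conjectured_exponent(R, p0, tol=1e-12):
    """Exponent (24): solve R_rho = R for real rho > 0 and return rho * R."""
    if R <= 0:
        raise ValueError("R must be positive")
    lo, hi = 1e-9, 1.0
    if R >= R_rho(lo, p0):  # at or above the small-rho limit of R_rho
        return 0.0
    while R_rho(hi, p0) > R:  # R_rho decreases toward 0 as rho grows
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if R_rho(mid, p0) > R:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) * R


if __name__ == "__main__":
    for R in (0.3, 0.5, 0.7):
        print(R, lower_exponent(R, 0.01), round(conjectured_exponent(R, 0.01), 4))
```

The rates in the usage loop are arbitrary sample points. At each of them the conjectured value lies at or above the staircase of (23), since the real $p$ solving $R_p = R$ is at least the integer $p$ of Theorem 3.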

The conjectured exponent for the BSC is shown in Fig. 9. The ex- 
perimental point and e*(R) at R = \ differ by 5 percent, an excellent 
match. 

The conjectured exponent has an interesting interpretation in terms
of the probability of error with "List Decoding" (Refs. 10, 11). With
list decoding, the decoder makes a list of the $k$ a posteriori most
probable codewords. If this list does not contain the transmitted
codeword, an error is said to occur. Random code bounds have been
obtained on this probability of error. This probability of error has an
exponent which we call $E_k(R)$ for list size $k$ (see Fig. 11).
$E_k(R)$ may be found from an exponent $E_\infty(R)$ by
$E_k(R) = E_\infty(R)$ for $R_k^* \le R$, where $R_k^*$ is the rate at
which $E_\infty(R)$ has slope $-k$, and for $R < R_k^*$, $E_k(R)$ is the
tangent to $E_\infty(R)$ at $R = R_k^*$. The rate-axis intercept of this
straight line is $R_k$.

The "list decoding exponent", $E_\infty(R)$, depends on the random code
assignment probabilities $\{p_k\}$. If that set $\{p_k\}$ is chosen for
each rate which maximizes $E_\infty(R)$, we have the "sphere-packing"
(Ref. 13) exponent. (See Fig. 11.) This is an exponent on the
probability of a block decoding error which cannot be exceeded by any
block code with any decoding algorithm even when a feedback channel is
available. Thus, the "list decoding exponent" and the "sphere-packing
exponent" are fundamental.

The conjectured computation exponent, $e^*(R)$, can now be related to
$E_\infty(R)$. To find $e^*(R)$ draw a line from $R$ on the rate-axis
which is tangent to $E_\infty(R)$. (See Fig. 12.) This line intersects
the exponent axis. The rate-axis intercept and the exponent-axis
intercept define a point on $e^*(R)$. Using this construction procedure
every point on $e^*(R)$ may be generated. Hence, $e^*(R)$ may be
obtained by a simple and natural construction from $E_\infty(R)$.

Fig. 11. List decoding exponent.

† Note added in the preparation of this paper: Recently I. M. Jacobs and
E. Berlekamp have underbounded the probability of buffer overflow or
undetected error using lower bounds to the probability of error with
list decoding. They have found that this bound has a computation
exponent which agrees with the conjectured exponent given above and have
shown that this bound grows linearly with the number of source digits
processed by the decoder before overflow. Also, H. Yudkin has recently
upper bounded the moments of static computation for integer and
noninteger moments. The computation exponent implied by these bounds
establishes the conjecture.

4.3 Heuristic Connection With The Overflow Probability 

In this section we establish a heuristic connection between
$P[C \ge L]$ and the probability of a buffer overflow $P_{BF}(N)$, the
probability of an overflow before the $N$th source decision is released
to the safety zone.

We begin by noting that $P_{BF}(N)$ is monotone increasing in $N$, hence
that $P_{BF}(N) \ge P_{BF}(1)$. We first relate $P_{BF}(1)$ and
$P[C \ge L]$.

Referring to Fig. 7, an example of a correct path trajectory which 
causes large static computation, we develop the following argument. 
If the random variable of static computation, C, is large, most of the 
computation will be performed on nodes which are close to the reference 
node. For computation to be performed on nodes distant from the 
reference node, the correct path must dip sufficiently at some distant 
point so that it returns at least to the level of the reference node. (Incor- 
rect paths typically decrease in metric.) Since the correct path increases 
on the average, such a dip must be very large and thus occurs with 
much smaller probability than do dips close to the reference node. 

If most of the static computation is done on nodes close to the
reference node, we may associate such computation with dips in the
correct path. Then, since all incorrect subsets in the neighborhood of a
path dip will have approximately equal amounts of computation done in
them, the total computation due to a correct path dip will be a small
multiple of the static computation in a particular incorrect subset in
the neighborhood of the dip. Consequently, if about $N_{av}$ incorrect
subsets have equal computation in them, say $C_0$, and $N_{av}C_0$ is
enough computation to cause overflow on the first decoded digit, then,
heuristically, $P_{BF}(1) \approx P[C \ge C_0]$. To find $C_0$, assume
that the buffer can store $B$ tree branches of $l$ digits each, that
each channel digit arrives in $\tau_{ch}$ seconds, and that the decoder
can perform one computation in $\tau_0$ seconds. Then, the search
pointer will be forced back to the safety zone if more than
$lB\tau_{ch}/\tau_0$ computations are needed to decode the first digit.
Setting $N_{av}C_0 = lB\tau_{ch}/\tau_0$ we have

$$P_{BF}(1) \approx P[C \ge lB\tau_{ch}/(N_{av}\tau_0)] \tag{25}$$

as our heuristic approximation.

If dips in the correct path are infrequent and if the decoder operates
at about twice or three times the speed required to do the average
computation in real time, then we may approximate $P_{BF}(N)$ by
assuming independence of the dips which cause overflow and have

$$P_{BF}(N) \approx N\,P_{BF}(1) \tag{26}$$

This last argument is weak and should, at best, serve as a rule of
thumb. The argument leading to the connection between $P_{BF}(1)$ and
the distribution of static computation is stronger and is partially
supported by the experimental evidence cited above.

Fig. 12. Construction for $e^*(R)$.

From (24), (25), (26), and the fact that the distribution of static
computation is Paretian, we deduce that the overflow probability, when
it is small, behaves as $N[N_{av}\tau_0/(lB\tau_{ch})]^{\beta}$ where
$\beta$ is related to the computation exponent $e(R)$ by
$\beta(R) = e(R)/R$. Thus, it is clear that the overflow probability is
relatively insensitive to the buffer size $B$ and the machine speed
$1/\tau_0$ but that it depends heavily on the source rate $R$. (Note
that changing the duration of the channel symbols, $\tau_{ch}$, is
tantamount to changing the channel and thus $e(R)$.) Since $e(R)$
increases with decreasing rate, we deduce that $\beta(R)$ is more than
doubled by a halving of the information rate or that the overflow
probability is more than squared.
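The sensitivities described above can be seen numerically from the rule of thumb itself. The sketch below is illustrative only: it assumes the tail is exactly $L^{-\beta}$ with unit constant, and every parameter value in the usage section is hypothetical rather than taken from the paper.

```python
def overflow_probability(N, B, l, tau_ch, tau_0, N_av, beta):
    """Rule of thumb from (25) and (26): N * (N_av*tau_0 / (l*B*tau_ch))**beta.

    Assumes P[C >= L] = L**(-beta) exactly, which is an idealization.
    """
    ratio = N_av * tau_0 / (l * B * tau_ch)
    return N * ratio ** beta


if __name__ == "__main__":
    # Hypothetical parameters chosen only to show the scaling.
    base = dict(N=1000, B=10000, l=2, tau_ch=1e-6, tau_0=1e-7, N_av=4, beta=1.5)
    p_base = overflow_probability(**base)
    p_big_buffer = overflow_probability(**{**base, "B": 2 * base["B"]})
    p_double_beta = overflow_probability(**{**base, "beta": 2 * base["beta"]})
    print(p_base, p_big_buffer, p_double_beta)
```

Doubling $B$ (or the machine speed $1/\tau_0$) lowers the estimate only by the factor $2^{\beta}$, whereas doubling $\beta$ squares the per-digit term, which is the contrast drawn in the text.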

V. CONCLUSIONS 

It has been said that the buffer overflow probability is of primary 
concern in the design of a sequential decoder. We have examined this 
buffer overflow probability and have shown that it is relatively insensi- 
tive to the buffer capacity and to machine speed for moderate speeds 
and capacities. We have also indicated that the overflow probability 
is a strong function of the source rate. In addition, bounds on the de- 
pendence of the overflow probability on the source rate were given and 
related to exponents presented in the coding theorem. 

We have argued that the particular sensitivities of the overflow
probability exist because the distribution of static computation is an
algebraic function of the distribution parameter; i.e., $P[C \ge L]$
behaves as $L^{-\beta}$, $\beta > 0$, for large $L$. In turn, it has
been observed that such behavior arises because the random variable of
static computation assumes exponentially large values with exponentially
small probabilities. This exponential growth of computation has been
shown to be basic to sequential decoding.

While the probability of overflow is relatively insensitive to many of
the machine parameters, these parameters can be so chosen and the
source rate can be so restricted that the probability of a buffer
overflow can be made very small. To achieve the small overflow
probabilities, the source rate for many channels need not be restricted
to be less than about 90 percent of $R_1$. ($R_1$ is generally known as
$R_{comp}$, the computational cutoff rate defined by Wozencraft
(Ref. 1). For many channels, $R_{comp}$ is a substantial fraction of
channel capacity.)